Fall back to a protected background segmented compression when compression keeps failing, to prevent unbounded context growth by wehos · Pull Request #1632 · Project-N-E-K-O/N.E.K.O

wehos · 2026-06-04T12:50:27Z

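Background

#1629 fixed the data-loss bug where a failed compression overwrote the history with an empty memo. It changed the behavior to "skip this round on failure, keep the full history, retry next round", which covers transient failures (for example, an RPM rate limit that blips once and recovers by the next round).

Persistent failures are not covered, though: if compression never succeeds, the history is never compressed and keeps piling up, so the prompt injected into the main model grows without bound (eventually blowing the context window or driving up cost). There are several independent causes, all of them valid:

Persistent 429 (RPM rate limiting, which has actually happened): retry and backoff do not help because the constraint is external.
Timeouts / oversized input: the larger the backlog, the larger each single input and the more likely a timeout (a vicious cycle).
Concurrent overwrite: /process calls update_history without holding _get_settle_lock (/renew, /settle and /cache all hold it), so while compression awaits the LLM for tens of seconds, a later conversation turn reloads from disk and overwrites the result, and the compression is wasted.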

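Approach (best effort → drop data only as a last resort)

Main path compresses in place (PR #1629, "Fix the recent-chat memo being overwritten with 先前对话的备忘录: 无。 when the summary model fails. A failed compression now skips this round and keeps the original recent.json, …", unchanged): skip on failure, keep the history, retry next round.
Best-effort background compression: when the main-path compression fails, start a protected one-shot background task that tries its best to compress.
- Protected: it runs on a snapshot of the history (so conversation turns cannot interrupt it). When done, it merges and writes back atomically, aligned to the snapshot by fingerprint (reusing review_history's _compute_review_capacity). If the backlog is still in place, it is replaced with the memo and any turns added in the meantime are kept; if the main path has already compressed or cleared it, the result is discarded (moot). The compress step runs outside _get_settle_lock (so LLM latency does not block other endpoints); the merge runs inside the lock (fast, serializes writes).
- Timeout mitigation: when the input exceeds RECENT_COMPRESS_INPUT_BUDGET_TOKENS, compress in segments (map-reduce) to shrink each single LLM input.
- One-shot, not resident; at most one in flight per character; cancelled as soon as any main-path round succeeds; failure backoff reuses the Gate6 pattern so that a persistently failing summary model does not burn calls for nothing.
Final hard-cap fallback: when the history exceeds RECENT_HARD_CAP_TOKENS (set very high so it does not trigger normally; it only covers cases such as persistent 429 that even best effort cannot rescue) → drop the oldest uncompressed raw turns, keep a number of recent messages plus the memo, and guarantee the history stays bounded.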
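
The snapshot-aligned merge described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `merge_snapshot_sketch` is a hypothetical name, messages are plain strings, and alignment is a direct prefix comparison, whereas the PR describes fingerprint-based alignment that reuses `_compute_review_capacity`.

```python
from typing import List, Tuple


def merge_snapshot_sketch(
    current: List[str], snapshot: List[str], memo: str
) -> Tuple[str, List[str]]:
    """Replace the snapshot with the memo if it still heads the history.

    Hypothetical helper: returns ("merged", new_history) when the backlog is
    still in place, or ("moot", current) when it was already compressed or
    cleared by someone else.
    """
    n = len(snapshot)
    # The backlog must still sit, unchanged, at the head of the history.
    if n == 0 or current[:n] != snapshot:
        return "moot", current
    # Keep whatever turns were appended while the background task was running.
    return "merged", [memo] + current[n:]
```

For example, merging a memo for the snapshot `["a", "b"]` into the history `["a", "b", "c"]` keeps the newer turn `"c"` behind the memo, while the same merge against a history that no longer starts with that snapshot leaves it untouched.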

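Changes

memory/recent.py: compress_history is refactored into reusable helpers (_render_messages_to_text / _build_summary_prompt / _invoke_summary_llm; the single-call path behaves as before) plus segmented compression when the input is too large; update_history gains an on_compress_done hook; new merge_backup_memo (snapshot-aligned merge) and _enforce_hard_cap (fallback trimming).
app/memory_server.py: _on_compress_done callback (start the background task on failure / cancel it and clear the backoff on success) + _run_backup_compress orchestration + compress_backup_tasks in-flight dedup + Gate6 failure backoff; wired into 4 compression call sites.
config/__init__.py: RECENT_COMPRESS_INPUT_BUDGET_TOKENS (8000), RECENT_HARD_CAP_TOKENS (60000).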
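
To make the segmented (map-reduce) path more concrete, here is a small sketch of splitting messages by a token budget and summarizing in two passes. It is an illustration under assumptions: `split_by_budget` and `map_reduce_summary` are hypothetical names, the tokenizer and the `summarize` callable are supplied by the caller, and the real helpers in `memory/recent.py` may differ (for example, they are async and handle retries).

```python
from typing import Callable, List


def split_by_budget(
    messages: List[str], budget: int, count_tokens: Callable[[str], int]
) -> List[List[str]]:
    """Greedily pack consecutive messages into segments of at most `budget` tokens.

    A single message larger than the budget still gets its own segment.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    used = 0
    for msg in messages:
        cost = count_tokens(msg)
        if current and used + cost > budget:
            segments.append(current)
            current, used = [], 0
        current.append(msg)
        used += cost
    if current:
        segments.append(current)
    return segments


def map_reduce_summary(
    messages: List[str],
    budget: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[str], str],
) -> str:
    """Summarize each segment (map), then summarize the partial summaries (reduce)."""
    segments = split_by_budget(messages, budget, count_tokens)
    if len(segments) == 1:
        return summarize("\n".join(segments[0]))
    partials = [summarize("\n".join(seg)) for seg in segments]
    return summarize("\n".join(partials))
```

With a whitespace word count as the stand-in tokenizer and a budget of 5, the messages `"a b c"`, `"d e"`, `"f g h i"`, `"j"` end up in two segments, so each summary call sees at most 5 tokens of input.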

Tests

tests/unit/test_recent_compression_failure.py (extended) + tests/unit/test_recent_compress_backup.py (new): segment splitting/paths, hard-cap trimming, snapshot merge (merged/moot), callback ok true/false, background backoff/in-flight/dead-letter/reset. uv run pytest tests/unit/ -k "compress or recent or backoff or review or stale or temporal or memo or summary" → 416 passed.

Relates to #1629.

🤖 Generated with Claude Code

Summary by CodeRabbit

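Bug fixes
- When the main-path compression fails, a protected background fallback task is started, and it is cancelled on success; failure backoff and dead-lettering were added to avoid pointless retries, with a hard trim triggered when necessary to keep memory bounded.
New features
- Large inputs can be summarized in segments by token budget and then merged, making compression more robust; new configurable settings for recent history were added for tuning.
Tests
- New unit tests cover the background fallback, backoff, merge, and hard-cap behavior.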

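#1629 covers transient compression failures (a rate limit blips and recovers next round: skip this round on failure, keep the full history, retry next round). Persistent failures (constant 429 / constant timeouts / concurrent overwrite by later turns), however, leave the history uncompressed and growing without bound. On a main-path compression failure, this change starts a protected one-shot background compression as a best-effort fallback and cancels it as soon as a main-path round succeeds; as a last resort, when the history exceeds a very large hard cap, the oldest uncompressed raw turns are dropped so the history stays bounded.
- recent.py: compress_history is refactored into reusable helpers (_render_messages_to_text / _build_summary_prompt / _invoke_summary_llm), with the single-call path behaving as before; oversized input goes through segmented map-reduce compression, which shrinks each single LLM input and avoids timeouts caused by oversized input. update_history gains an on_compress_done callback hook; new merge_backup_memo (fingerprint snapshot-aligned merge and write-back, reusing _compute_review_capacity) and _enforce_hard_cap (final fallback trimming).
- memory_server.py: _on_compress_done callback (start the background task on failure / cancel it and clear the backoff on success); _run_backup_compress orchestration (compress outside the lock, merge inside _get_settle_lock); compress_backup_tasks in-flight dedup; reuses the Gate6 failure backoff so that a persistently failing summary model does not burn calls for nothing; wired into the four compression call sites /process, /renew, /settle, and IdleMaint.
- config: RECENT_COMPRESS_INPUT_BUDGET_TOKENS (segmentation, 8000), RECENT_HARD_CAP_TOKENS (fallback, 60000, set very high so it does not trigger normally).
- Tests: segment splitting/paths, hard-cap trimming, snapshot merge (merged/moot), callback ok true/false, background backoff/in-flight/dead-letter/reset.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>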

coderabbitai · 2026-06-04T12:50:52Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9b885925-886d-41b6-9a73-7987798fafb8

📥 Commits

Reviewing files that changed from the base of the PR and between 7e23916 and 832a058.

📒 Files selected for processing (4)

app/memory_server.py
memory/recent.py
tests/unit/test_recent_compress_backup.py
tests/unit/test_recent_compression_failure.py

🚧 Files skipped from review as they are similar to previous changes (3)

app/memory_server.py
tests/unit/test_recent_compression_failure.py
tests/unit/test_recent_compress_backup.py

Walkthrough

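This PR adds a background fallback task chain for failures of the main history-compression path and refactors compress_history into a modular, segmented map-reduce. It adds hard-cap trimming, fingerprint-based backoff, and merge write-back, and injects a callback into IdleMaint and the conversation endpoints to drive the fallback, meow.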

Changes

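Recent-history compression fallback and flow refactor

| Layer / File(s) | Summary |
|---|---|
| Compression budget config definitions `config/__init__.py`, `memory/recent.py` | Adds two token-threshold constants, `RECENT_COMPRESS_INPUT_BUDGET_TOKENS` and `RECENT_HARD_CAP_TOKENS`, which control the segmented-compression budget and the history hard cap respectively, and registers them in the export list, meow. |
| Message rendering and LLM call basics `memory/recent.py` | Adds `_render_messages_to_text`, `_build_summary_prompt`, `_invoke_summary_llm`, and others; applies head/tail truncation per message and makes Stage 1 uniformly return a summary string, meow. |
| Segmented map-reduce compression and compress_history rewrite `memory/recent.py` | Implements `_split_messages_by_budget` / `_segmented_compress` and rewrites `compress_history` as segmented path + bounded retries + memo generation, returning `None` on failure, meow. |
| Completion callback / hard cap / merge write-back `memory/recent.py` | Adds `_notify_compress_done(callback,...)`, `enforce_hard_cap(lanlan_name)`, and `merge_backup_memo(lanlan_name, snapshot, memo)`, and defines the return semantics `merged`/`moot`/`failed`, meow. |
| update_history callback parameter and path handling `memory/recent.py` | `CompressedRecentHistoryManager.update_history(...)` gains an `on_compress_done` parameter; the callback is invoked on both success and failure of the main-path compression, to hand off to the background fallback or cancel the task, meow. |
| Background fallback task implementation and state management `app/memory_server.py` | Introduces per-character `compress_backup_tasks`, a failure backoff counter and failure-fingerprint storage, and implements `_run_backup_compress` and `_on_compress_done` (spawn/cancel/dead-letter), meow. |
| IdleMaint and endpoint callback integration `app/memory_server.py` | Passes `on_compress_done=_on_compress_done` into `update_history` in IdleMaint, `/process`, `/renew`, `/settle`, and elsewhere, so the fallback chain takes effect at every compression trigger point, meow. |
| Unit tests for the compression flow and fallback mechanism `tests/unit/test_recent_compress_backup.py`, `tests/unit/test_recent_compression_failure.py` | Adds several groups of unit tests covering segmented compression, hard-cap trimming, merge_backup_memo behavior, update_history callback semantics, and the background task's spawn/cancel/in-flight guard, backoff/dead-letter, and fingerprint-reset branches, with LLM and anchor mocks to avoid disk I/O and leftover tasks, meow. |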

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

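Project-N-E-K-O/N.E.K.O#1629: both PRs modify the compression-failure path in memory/recent.py and the handling in update_history, focusing on not overwriting (protecting) the existing memo when compression fails, meow.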

Poem

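If the main path falters, the background quietly rises to escort, meow,
Fingerprints keep count, backoff and dead letters end the busywork, meow,
Segments merge within budget, summaries layer on and on, meow,
The hard cap trims old text, the memo keeps what is freshest, meow,
Tests stand guard, the fallback holds, the compression loop stays calm, meow ✨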

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately describes the core change of this PR (introducing a protected background segmented-compression fallback when compression keeps failing, to prevent unbounded context growth) and fully matches raw_summary and pr_objectives. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 781e66405f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-04T12:57:29Z

+        if await acount_tokens(summary) > MAX_SUMMARY_TOKENS:
+            reduced = await self.further_compress(summary)
+            if reduced is None:
+                logger.warning(f"[RecentHistory] {lanlan_name} 二次压缩失败，跳过本轮压缩")
+                return None


Keep retrying after second-stage compression fails

When Stage 1 returns an oversized summary and further_compress() exhausts its own retries once, this now returns None immediately and skips the whole compression. Before this refactor, the outer summary loop continued and retried Stage 1 up to its remaining attempts, which matters when the second-stage call is transiently failing or when another Stage-1 response would be short enough to avoid Stage 2. In that scenario long histories remain uncompressed and fall into the new backup/hard-cap path unnecessarily.
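
The behavior this comment asks for, retrying Stage 1 when Stage 2 fails while keeping the total number of attempts bounded, could look roughly like the sketch below. It is only an illustration: `compress_with_bounded_retry`, its parameters, and the default of 3 attempts are assumptions, not the signatures used in `memory/recent.py`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def compress_with_bounded_retry(
    stage1: Callable[[], Awaitable[Optional[str]]],
    stage2: Callable[[str], Awaitable[Optional[str]]],
    count_tokens: Callable[[str], int],
    max_summary_tokens: int,
    max_attempts: int = 3,  # hypothetical default
) -> Optional[str]:
    """Retry the whole pipeline a bounded number of times.

    Every pass through the loop counts as one attempt, so a Stage-2 failure
    leads to another Stage-1 call without any risk of looping forever.
    """
    for _ in range(max_attempts):
        summary = await stage1()
        if summary is None:
            continue  # Stage 1 failed: use up one attempt and retry
        if count_tokens(summary) <= max_summary_tokens:
            return summary  # short enough, Stage 2 is not needed
        reduced = await stage2(summary)
        if reduced is not None:
            return reduced
        # Stage 2 failed: fall through and retry Stage 1
    return None
```

The point of the design is that a second Stage-1 response may be short enough to skip Stage 2 entirely, so giving up after the first Stage-2 failure throws away attempts that could still succeed.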

chatgpt-codex-connector · 2026-06-04T12:57:29Z

+        async with _get_settle_lock(lanlan_name):
+            status = await recent_history_manager.merge_backup_memo(lanlan_name, snapshot, result[0])


Serialize backup merges with /process writes

This lock only protects callers that also use _get_settle_lock, but /process still writes recent history without that lock at app/memory_server.py:3937. If a backup merge reaches the awaited disk-write section while a /process update reloads and writes the same recent.json, the last writer can drop either the newly generated memo or the just-added turn, so the backup merge is not actually protected during active chat.

chatgpt-codex-connector · 2026-06-04T12:57:29Z

+        def _trim():
+            if count_tokens(self._render_messages_to_text(history, lanlan_name)) <= RECENT_HARD_CAP_TOKENS:
+                return None  # 未超，不动


Count the full history when enforcing the hard cap

_render_messages_to_text() truncates every message to RECENT_PER_MESSAGE_MAX_TOKENS for summary input, but the context path later appends the full message content in app/memory_server.py:4363-4369. With sustained compression failures and long pasted turns, the hard-cap check can count hundreds of >500-token messages as under RECENT_HARD_CAP_TOKENS while the actual injected recent history is far above the cap, so the new fallback does not reliably bound prompt size.
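
A quick arithmetic sketch shows how large the gap can be. The per-message limit of 500 tokens and the cap of 60000 come from this thread; the whitespace tokenizer, the helper names, and the workload of 100 messages of 2000 tokens each are made up for illustration.

```python
from typing import List

PER_MESSAGE_MAX_TOKENS = 500   # truncation limit mentioned in the review
HARD_CAP_TOKENS = 60000        # RECENT_HARD_CAP_TOKENS from the PR description


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def truncated_total(messages: List[str]) -> int:
    """Token total as seen through per-message truncation."""
    return sum(min(count_tokens(m), PER_MESSAGE_MAX_TOKENS) for m in messages)


def full_total(messages: List[str]) -> int:
    """Token total of the full message contents."""
    return sum(count_tokens(m) for m in messages)


# Hypothetical workload: 100 pasted turns of 2000 tokens each.
history = [" ".join(["w"] * 2000) for _ in range(100)]

under_cap_when_truncated = truncated_total(history) <= HARD_CAP_TOKENS
over_cap_in_reality = full_total(history) > HARD_CAP_TOKENS
```

Here the truncated count is 100 × 500 = 50000 tokens, which passes the cap check, while the full contents add up to 100 × 2000 = 200000 tokens.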

chatgpt-codex-connector · 2026-06-04T12:57:29Z

+        except Exception as e:
+            logger.error(f"[RecentHistory] {lanlan_name} 后台压缩合并落盘失败: {e}", exc_info=True)
+        logger.info(
+            f"[RecentHistory] {lanlan_name} 后台压缩合并完成：history {len(current)}→{len(new_history)}"
+        )
+        return 'merged'


Report failed backup writes as failures

If the atomic write fails here (for example a transient filesystem/cloud-save error), this logs the exception but still falls through to return 'merged'; _run_backup_compress() then clears the backup failure state and reports success. That leaves only the in-memory memo while recent.json remains uncompressed, so a restart or reload loses the successful backup compression and the retry budget has already been reset.
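
One way to make the write outcome visible to the caller is to return a distinct status from the write step. The sketch below assumes a hypothetical `save` callable and helper name; it is not the PR's `merge_backup_memo`, only the shape of the fix being requested.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def write_merged_history(
    new_history: List[str], save: Callable[[List[str]], None]
) -> str:
    """Persist the merged history and report what actually happened.

    Returns "merged" only when the write succeeded, and "failed" otherwise,
    so the caller can bump its backoff instead of clearing it.
    """
    try:
        save(new_history)
    except Exception:
        logger.exception("backup merge write failed")
        return "failed"
    return "merged"
```

A caller would then clear its failure state on "merged" and record a failure on "failed", instead of treating every return as success.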

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

tests/unit/test_recent_compress_backup.py (1)

126-150: ⚡ Quick win

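The test name says "resets" but there are no state-reset assertions; consider adding them, meow.

It currently only verifies that the background task is started again. It should also assert that the dead-letter state is cleared and that persistence was triggered, so a future regression does not slip through, meow.

Example assertions that can be added directly, meow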

-    with patch.object(memory_server, "recent_history_manager", fake_mgr), \
-         patch.object(memory_server, "_asave_maint_state", AsyncMock()):
+    with patch.object(memory_server, "recent_history_manager", fake_mgr), \
+         patch.object(memory_server, "_asave_maint_state", AsyncMock()) as save_state:
         await memory_server._on_compress_done(name, new_snapshot, ok=False, detailed=False)
         # 输入变了 → 复位放行，起了后台
         task = memory_server.compress_backup_tasks.get(name)
         assert task is not None
+        assert memory_server._maint_state[name]["compress_backup_fail_attempts"] == 0
+        assert memory_server._maint_state[name]["compress_backup_fail_fp"] is None
+        save_state.assert_awaited_once()
         await _cleanup_task(task)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_recent_compress_backup.py` around lines 126 - 150, Add
assertions after awaiting memory_server._on_compress_done to verify the
dead-letter state was cleared and persisted: assert
memory_server._maint_state[name]["compress_backup_fail_attempts"] == 0 and that
memory_server._maint_state[name]["compress_backup_fail_fp"] is falsy (e.g. None
or empty string) to confirm the fingerprint was cleared, and assert the patched
AsyncMock memory_server._asave_maint_state was awaited (e.g.
_asave_maint_state.assert_awaited()) so persistence was triggered; keep existing
checks that a background task was started via
memory_server.compress_backup_tasks.get(name).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/memory_server.py`:
- Around line 3416-3453: _on_compress_done currently performs blocking awaits
(_clear_compress_backup_failure and _asave_maint_state) while running inside the
settle/renew critical section; change it to only mutate in-memory _maint_state
and fire-and-forget the persistence: in the ok=True branch update/reset the
in-memory compress-backup failure counters in _maint_state (do not await
_clear_compress_backup_failure) and spawn a background task to call
_clear_compress_backup_failure (or a small wrapper that does the save) via
_spawn_background_task; likewise, when resetting fail counters after input
change set state['compress_backup_fail_attempts']=0 and
state['compress_backup_fail_fp']=None and call _spawn_background_task to run
_asave_maint_state (do not await); keep spawn/cancel logic for
_run_backup_compress as-is so no awaits happen inside the critical path.
- Around line 3399-3409: compress/merge-stage exceptions from
recent_history_manager.merge_backup_memo and the subsequent
_clear_compress_backup_failure are currently not counted toward
compress_backup_fail_attempts; wrap the merge+clear steps in their own
try/except that, on any Exception (but not asyncio.CancelledError), calls the
same failure-counter helper used when compress_history() fails to bump
compress_backup_fail_attempts for lanlan_name (reuse the existing helper that
increments compress_backup_fail_attempts), then re-raise or log consistently;
keep the existing asyncio.CancelledError handling separate and ensure you
reference recent_history_manager.merge_backup_memo,
_clear_compress_backup_failure, compress_history(), and
compress_backup_fail_attempts when making the change.

In `@memory/recent.py`:
- Around line 665-667: Remove the incorrect early-return fast-path that assumes
few messages cannot exceed token hard cap: delete the len(history) <=
self.max_history_length + 1 check in the block handling
self.user_histories[lanlan_name] and instead compute the actual token usage for
`history` (using the existing token-counting utility in this module/class) and
only return when the computed token count is safely <= RECENT_HARD_CAP_TOKENS;
keep references to `history`, `self.user_histories`, `self.max_history_length`
and `RECENT_HARD_CAP_TOKENS` so the logic enforces token-based truncation rather
than message-count heuristics.
- Around line 573-586: The loop in reduce (using _split_texts_by_budget,
_invoke_summary_llm, _build_summary_prompt) currently breaks when len(batches)
>= len(partials) but then returns "\n\n".join(partials), potentially handing
back already-over-budget partials to compress_history; change that behavior so
when the reduction cannot shrink further (len(batches) >= len(partials)) the
function returns None (or another explicit failure signal) immediately instead
of breaking and returning partials, so upstream (compress_history) can handle
the over-budget case instead of re-sending an oversized chunk.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e367eb4a-1715-4817-99da-3f1fbaf2724f

📥 Commits

Reviewing files that changed from the base of the PR and between 041e0d8 and 781e664.

📒 Files selected for processing (5)

app/memory_server.py
config/__init__.py
memory/recent.py
tests/unit/test_recent_compress_backup.py
tests/unit/test_recent_compression_failure.py

coderabbitai · 2026-06-04T13:00:03Z

+        # 2) 合并写回（锁内，快）。merge_backup_memo 用 fingerprint 对齐，积压已被
+        #    主路径压掉 / 被清空就返回 'moot' 丢弃（白做）。
+        async with _get_settle_lock(lanlan_name):
+            status = await recent_history_manager.merge_backup_memo(lanlan_name, snapshot, result[0])
+        # 'merged' 或 'moot' 都说明这段积压已处理 / 已过时，清退避计数。
+        await _clear_compress_backup_failure(lanlan_name)
+        logger.info(f"[CompressBackup] {lanlan_name} 后台压缩完成：{status}")
+    except asyncio.CancelledError:
+        logger.info(f"[CompressBackup] {lanlan_name} 后台压缩被取消（主路径已成功）")
+    except Exception as e:
+        logger.error(f"[CompressBackup] {lanlan_name} 后台压缩后处理出错: {e}")


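⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Count exceptions from the merge/write-back stage toward the backoff too, meow

Right now only a compress_history() failure bumps compress_backup_fail_attempts. If merge_backup_memo() or the subsequent _clear_compress_backup_failure() keeps raising, the task exits without recording a failure; the next main-path compression failure then starts another background compression on the same snapshot, which effectively bypasses the Gate6 backoff, meow. A persistent disk/merge-side fault would therefore still keep burning summary calls for nothing, meow.

😼 A possible fix for reference, meow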

-        async with _get_settle_lock(lanlan_name):
-            status = await recent_history_manager.merge_backup_memo(lanlan_name, snapshot, result[0])
-        # 'merged' 或 'moot' 都说明这段积压已处理 / 已过时，清退避计数。
-        await _clear_compress_backup_failure(lanlan_name)
-        logger.info(f"[CompressBackup] {lanlan_name} 后台压缩完成：{status}")
+        try:
+            async with _get_settle_lock(lanlan_name):
+                status = await recent_history_manager.merge_backup_memo(
+                    lanlan_name, snapshot, result[0]
+                )
+            # 'merged' 或 'moot' 都说明这段积压已处理 / 已过时，清退避计数。
+            await _clear_compress_backup_failure(lanlan_name)
+            logger.info(f"[CompressBackup] {lanlan_name} 后台压缩完成：{status}")
+        except Exception as e:
+            attempts = await _record_compress_backup_failure(lanlan_name, snapshot)
+            logger.warning(
+                f"[CompressBackup] {lanlan_name} 后台压缩后处理失败，"
+                f"退避计数 → {attempts}: {e}"
+            )
+            return
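
The backoff this comment relies on can be pictured as a small gate keyed on a fingerprint of the input. The sketch below is an assumption-laden illustration: the state keys are shortened, `MAX_ATTEMPTS = 3` stands in for `MEMORY_LIVENESS_MAX_ATTEMPTS` (whose real value is not given here), and a SHA-256 of the joined messages stands in for `build_review_fingerprint`.

```python
import hashlib
from typing import Dict, List

MAX_ATTEMPTS = 3  # hypothetical stand-in for MEMORY_LIVENESS_MAX_ATTEMPTS


def fingerprint(snapshot: List[str]) -> str:
    """Hypothetical fingerprint: hash of the joined snapshot messages."""
    return hashlib.sha256("\n".join(snapshot).encode("utf-8")).hexdigest()


def record_failure(state: Dict, snapshot: List[str]) -> int:
    """Count a failure for this input; a new input starts a fresh count."""
    fp = fingerprint(snapshot)
    if state.get("fail_fp") != fp:
        state["fail_fp"] = fp
        state["fail_attempts"] = 0
    state["fail_attempts"] += 1
    return state["fail_attempts"]


def should_spawn(state: Dict, snapshot: List[str]) -> bool:
    """Decide whether a new background attempt is allowed."""
    if state.get("fail_attempts", 0) >= MAX_ATTEMPTS:
        if state.get("fail_fp") == fingerprint(snapshot):
            return False  # dead-letter: same input keeps failing
        # Input changed: the old count is stale, reset it and allow a retry.
        state["fail_attempts"] = 0
        state["fail_fp"] = None
    return True
```

If merge-stage failures call the same `record_failure` as compression failures, a persistently failing disk reaches the dead-letter state just like a persistently failing summary model.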

coderabbitai · 2026-06-04T13:00:03Z

+async def _on_compress_done(lanlan_name: str, snapshot: list, ok: bool, detailed: bool):
+    """update_history 压缩结束回调（recent.py 注入）。
+    ok=True（主路径压成功）→ cancel 在跑的后台兜底 + 清退避；
+    ok=False（主路径压失败）→ 起一个受保护的后台兜底压缩（若无在跑、未被退避挡）。
+
+    本回调只 spawn / cancel task，不 await 后台 LLM——它可能在 _get_settle_lock
+    内被调（/renew、/settle），绝不能阻塞。"""
+    if ok:
+        task = compress_backup_tasks.get(lanlan_name)
+        if task is not None and not task.done():
+            task.cancel()
+        await _clear_compress_backup_failure(lanlan_name)
+        return
+    # ok=False：主路径压缩失败 → 起后台兜底
+    if not snapshot:
+        return
+    existing = compress_backup_tasks.get(lanlan_name)
+    if existing is not None and not existing.done():
+        return  # in-flight：同角色已有后台压缩在跑，不重复起
+    # 失败退避（Gate 6 模式）：连续失败 ≥ N 且输入未变 → dead-letter，不再起，
+    # 防 summary 模型持续故障时每轮都起一个注定失败的后台任务空烧。
+    from config import MEMORY_LIVENESS_MAX_ATTEMPTS
+    from memory.recent import build_review_fingerprint
+    state = _maint_state.setdefault(lanlan_name, {})
+    fail_attempts = state.get('compress_backup_fail_attempts', 0) or 0
+    if fail_attempts >= MEMORY_LIVENESS_MAX_ATTEMPTS:
+        cur_fp = build_review_fingerprint(snapshot)
+        if state.get('compress_backup_fail_fp') == cur_fp:
+            logger.debug(
+                f"[CompressBackup] {lanlan_name} 失败退避 dead-letter"
+                f"（连续失败 {fail_attempts} 次且输入未变），跳过"
+            )
+            return
+        # 输入变了 → 旧计数过期，复位放行
+        state['compress_backup_fail_attempts'] = 0
+        state['compress_backup_fail_fp'] = None
+        await _asave_maint_state()
+    task = _spawn_background_task(_run_backup_compress(lanlan_name, list(snapshot), detailed))


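⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This callback still writes to disk inside the critical section, meow

The comment says it "only spawns / cancels tasks", but the ok=True branch awaits _clear_compress_backup_failure(), and the backoff-reset branch awaits _asave_maint_state(). Both /renew and /settle call update_history() inside _get_settle_lock(), so these awaits pull the write of idle_maintenance_state.json into the serialized window of a user request; on a slow disk this directly lengthens the request and additionally blocks /new_dialog for the same character, meow. It would be better to change only in-memory state here and persist fire-and-forget, meow.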
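
The "mutate in memory, persist fire-and-forget" pattern suggested here can be sketched with plain asyncio as below. `spawn_background` and `reset_failure_state` are hypothetical helpers written for this illustration (the repository has its own `_spawn_background_task`, whose implementation is not shown in this thread), and `save` stands in for the state-file write.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Set

_background: Set[asyncio.Task] = set()


def spawn_background(coro) -> asyncio.Task:
    """Schedule a coroutine without awaiting it.

    The set keeps a strong reference so the task is not garbage-collected
    before it finishes.
    """
    task = asyncio.create_task(coro)
    _background.add(task)
    task.add_done_callback(_background.discard)
    return task


def reset_failure_state(
    state: Dict, save: Callable[[], Awaitable[None]]
) -> asyncio.Task:
    """Reset counters synchronously, then persist in the background."""
    state["compress_backup_fail_attempts"] = 0
    state["compress_backup_fail_fp"] = None
    return spawn_background(save())
```

The caller sees the reset immediately and never waits on the disk, at the cost that a crash before the background write completes loses the reset.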

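…disk failure reporting
- compress_history: restores the behavior of retrying stage 1 when stage 2 (further_compress) fails (the refactor had degraded it to a direct return None), reproduced with a bounded counting loop to avoid the infinite loop of the original continue, which did not count attempts.
- _enforce_hard_cap: counts real content tokens (it no longer goes through the compression renderer _render_messages_to_text, which truncates each message to ≤500; otherwise very long raw text is underestimated and the hard cap has no effect on it); removes the "return early when the message count ≤ N+1" assumption (a few very long raw messages can break through the token cap).
- _segmented_compress: when reduce cannot shrink further or the depth is exhausted, merged is hard-truncated to the budget as a fallback, so over-budget text is no longer passed on to the main final summary.
- merge_backup_memo: returns 'failed' when the disk write fails (instead of falsely reporting 'merged'); _run_backup_compress bumps the backoff accordingly instead of clearing the counter.
- Tests: adds merge disk-write failure → 'failed'; fixes the test import style (monkeypatch string targets throughout) and the except scope of _cleanup_task (BaseException → CancelledError+Exception).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>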

wehos · 2026-06-04T13:06:51Z

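Thanks to @chatgpt-codex-connector and @coderabbitai for the reviews. Addressed item by item (pushed in 7e2391653):

Codex

recent.py no longer retried stage 1 after a stage-2 failure: ✅ fixed. The refactor did regress this; the "further_compress fails → retry the whole of stage 1" behavior is restored, now with a bounded counting loop (avoiding the potential infinite loop of the original continue, which did not count attempts).
Background merge not serialized with /process writes: this is a deliberate trade-off. /process does not hold _get_settle_lock so that an LLM compression lasting tens of seconds cannot block the conversation hot path. The background merge is a best-effort fallback, aligned to the snapshot by fingerprint (if the main path has already compressed the backlog, the result is discarded as moot), and "read current → locate → write memory" is synchronous and atomic under asyncio's single thread. The only narrow race is "/process steps in after the in-memory write but before the disk write", whose effect is that an occasional merge gets overwritten and heals itself the next round. Locking /process would bring the compression blocking back onto the hot path, which is not worth it.
Hard cap used truncated token counts: ✅ fixed. _enforce_hard_cap now counts real content tokens and no longer goes through the compression renderer that truncates each message to ≤500 (otherwise very long raw text is underestimated and the fallback has no effect on it).
Failed disk write still reported success: ✅ fixed. merge_backup_memo returns 'failed' when the disk write fails, and _run_backup_compress bumps the backoff accordingly instead of clearing the counter.

CodeRabbit

Count merge/write-back exceptions toward the backoff: ✅ already covered by the fix for Codex#4 above (merge disk-write failure → 'failed' → bump). The only exception that propagates, MaintenanceModeError, is cloudsave maintenance mode (temporarily not writable); by design it does not count toward the failure backoff, and the retry happens naturally after recovery.
Callback writes to disk in the critical section: _clear/_save_maint_state writes a state file of a few hundred bytes and is not an LLM call; compared with the multi-second compression that update_history already performs, this in-lock overhead is negligible. The "no LLM inside the lock" point stressed by the comment still holds. Keeping as is.
reduce cannot shrink yet still passes over-budget text: ✅ fixed. _segmented_compress now applies atruncate_to_tokens to merged at the end as a hard-truncation fallback, guaranteeing that the input handed to the main final summary is bounded.
Early return based on message count assumed the cap was not exceeded: ✅ fixed. The len(history) <= N+1 early return is removed; everything goes through _trim, which counts real tokens (and still guarantees that at least the most recent max_history_length messages are kept).

The overly broad except and the test import style flagged by @github-code-quality are fixed as well.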

chatgpt-codex-connector · 2026-06-04T13:07:01Z

To use Codex here, create an environment for this repo.

coderabbitai · 2026-06-04T13:07:30Z

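@wehos Hmph, that was a lot of fixes this time, meow, but I will grudgingly approve, meow~

Confirming item by item, meow:

The four CodeRabbit items

Count merge/write-back exceptions toward the backoff: covered by merge_backup_memo returning 'failed' plus the backoff bump, understood, meow. Not counting MaintenanceModeError toward the backoff and self-healing after recovery is also reasonable, meow.
Callback writes to disk in the critical section: accepting this trade-off, meow. A state file of a few hundred bytes versus a multi-second LLM compression is indeed negligible; the key invariant "no LLM inside the lock" still holds, so keeping it as is is fine, meow.
reduce cannot shrink yet still passes over-budget text: _segmented_compress now ends with an atruncate_to_tokens hard truncation, so the input is bounded, meow, ✅.
Early return based on message count assumed the cap was not exceeded: the len(history) <= N+1 early return is gone and everything goes through _trim with real token counts, ✅ meow.

The overall logic is more robust, meow, and the three-layer defense of background fallback + hard cap is very solid, meow~ I... I guess I will grudgingly praise you just this once, meow!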

(´•ω•`)ﾉ🐇✨

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e23916532

chatgpt-codex-connector · 2026-06-04T13:12:10Z

+                    await self._enforce_hard_cap(lanlan_name)
+                    # best-effort：通知上层起一个受保护的后台压缩任务尽力压（主路径失败）。
+                    await self._notify_compress_done(on_compress_done, lanlan_name, snapshot, False, detailed)


Run backup compression before pruning the snapshot

When a single main compression attempt fails after recent history is already above RECENT_HARD_CAP_TOKENS, this prunes the oldest messages before scheduling the background fallback. The fallback then compresses the pre-prune snapshot, but merge_backup_memo() requires that whole snapshot to still be present at the head of current history, so a later successful backup becomes moot and the pruned conversation is permanently lost instead of being summarized. This turns the intended final fallback for sustained failures into data loss on any transient failure once the cap is exceeded.

chatgpt-codex-connector · 2026-06-04T13:12:10Z

+                if kept and kept_tok + mtok > RECENT_HARD_CAP_TOKENS and len(kept) >= self.max_history_length:
+                    break
+                kept.append(msg)
+                kept_tok += mtok


Allow the hard cap to trim oversized recent turns

When the newest max_history_length messages alone exceed RECENT_HARD_CAP_TOKENS (for example a few large pasted turns while compression is failing), this loop still appends them all because it refuses to stop until len(kept) >= self.max_history_length. The resulting new_history can remain above the hard cap, and /new_dialog later injects those full message contents, so the fallback still does not bound prompt size in exactly the long-message failure case it is meant to protect.

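Codex pointed out that trimming on **every** main-path compression failure means that, once the history exceeds the cap, any single transient failure immediately drops the oldest raw turns, while the background compression works from the pre-trim snapshot → the merge no longer aligns and is moot → that batch of turns is lost permanently without ever being summarized, turning "the final fallback for persistent failures" into "data loss on a transient failure".
- recent.py: the failure branch of update_history no longer trims; _enforce_hard_cap is renamed to enforce_hard_cap (so memory_server can call it) and persists to disk on its own (no longer relying on update_history's subsequent write).
- memory_server: trimming is now triggered by _run_backup_compress after the background best effort has also failed (inside the lock, serialized write), and by the dead-letter branch (without the lock, to avoid a re-entrant deadlock on the settle lock). On a transient failure the background task succeeds and never reaches the trim, so that batch of turns is kept as a summary.
- Tests: adds an enforce_hard_cap AsyncMock and asserts that trimming is triggered on background failure / dead-letter.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>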

wehos · 2026-06-04T13:20:16Z

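@chatgpt-codex-connector On the two items from the re-review (pushed in 832a058c7):

Trimming ran ahead and pruned the snapshot, so even a transient failure lost data: ✅ fixed, and this was a sharp catch. Previously enforce_hard_cap ran on every main-path compression failure; once the history was over the cap, any transient 429 immediately dropped the oldest raw turns, while the background compression used the pre-trim snapshot → merge_backup_memo failed to align and returned moot → that batch was lost permanently without being summarized. It now works as follows:
- the failure branch of update_history only starts the background task and does not trim;
- trimming is triggered by the background _run_backup_compress after best effort has also failed (inside the lock), and by the dead-letter branch;
- so on a transient failure the background task succeeds and never reaches the trim, and that batch is kept as a summary: data is dropped only on persistent failure.
Still no trim when the most recent max_history_length messages alone exceed the cap: this is a deliberate boundary decided for this PR. The hard constraint is "never drop recent raw turns", so _trim guarantees that at least the newest N messages are kept and never deletes recent raw text. When those N messages by themselves (a few large pastes) break through the cap, the fallback can no longer bound the size, but truncating or dropping recent raw text would violate that constraint. A single or a few very long pastes fall under the existing "known uncapped items" in config (the user's own words are concatenated directly into the prompt) and are handled at the per-message level, outside the scope of this fallback, which guards against growth from an accumulating message count. Keeping as is.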
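
The boundary described here (the cap is enforced only once at least N recent messages are kept) follows directly from the loop condition quoted in the review. The sketch below reproduces that condition in a standalone function; `trim_to_cap` is a hypothetical name, messages are plain strings, and memo handling is left out.

```python
from typing import Callable, List


def trim_to_cap(
    history: List[str],
    cap: int,
    min_keep: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the newest messages within `cap` tokens, but never fewer than `min_keep`.

    Walks from newest to oldest and stops only when adding the next message
    would exceed the cap AND at least `min_keep` messages are already kept.
    """
    kept: List[str] = []
    kept_tok = 0
    for msg in reversed(history):
        mtok = count_tokens(msg)
        if kept and kept_tok + mtok > cap and len(kept) >= min_keep:
            break
        kept.append(msg)
        kept_tok += mtok
    kept.reverse()  # restore chronological order
    return kept
```

With a word-count tokenizer, a cap of 4 and `min_keep=2`, the history `["a a a", "b b b", "c c", "d"]` is trimmed to `["c c", "d"]`. Two 5-token messages with the same settings are both kept even though together they exceed the cap, which is exactly the trade-off the author describes.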

chatgpt-codex-connector · 2026-06-04T13:24:59Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

github-code-quality Bot found potential problems Jun 4, 2026

chatgpt-codex-connector Bot reviewed Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

chatgpt-codex-connector Bot reviewed Jun 4, 2026
